99 research outputs found
A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching
The identification of nested motifs in genomic sequences is a complex computational problem. Detecting these patterns is important for discovering transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. Here, we designed a de novo strategy for detecting patterns that represent nested motifs based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories: motifs within other motifs, motifs flanked by other motifs, and motifs of large size. Our methodology, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to find putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which we call Mamushka.
Fil: Romero, José Rodolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
Fil: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Cs. E Ingeniería de la Computacion; Argentina
Fil: Garbus, Ingrid. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
Fil: Echenique, Carmen Viviana. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
Fil: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Cs. E Ingeniería de la Computacion; Argentina
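The three pattern categories named in the abstract above can be illustrated with a minimal sketch. This is not the Mamushka implementation: the `Hit` type, the `classify` function, and the `max_gap` and `large_size` thresholds are all hypothetical names and values chosen for illustration, and motif occurrences are assumed to be given as half-open intervals on one sequence.

```python
from typing import List, NamedTuple, Set


class Hit(NamedTuple):
    """A motif occurrence as a half-open interval [start, end). Hypothetical type."""
    name: str
    start: int
    end: int


def classify(target: Hit, others: List[Hit],
             max_gap: int = 50, large_size: int = 5000) -> Set[str]:
    """Label a hit with any of the three pattern categories it matches.

    max_gap and large_size are illustrative thresholds, not values from the paper.
    """
    labels = set()
    length = target.end - target.start
    # Category 1: the target lies inside a strictly longer hit.
    for o in others:
        if o.start <= target.start and target.end <= o.end and (o.end - o.start) > length:
            labels.add("within")
    # Category 2: the same motif name occurs just left and just right of the target.
    left = {o.name for o in others if 0 <= target.start - o.end <= max_gap}
    right = {o.name for o in others if 0 <= o.start - target.end <= max_gap}
    if left & right:
        labels.add("flanked")
    # Category 3: the target itself is unusually long.
    if length >= large_size:
        labels.add("large")
    return labels
```

A hit can receive several labels at once, which is why the function returns a set rather than a single category.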
Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction
In pharmaceutical research and the biomedical sciences, QSAR modeling is an established approach for predicting the biological activity of drug candidates during drug discovery. Yet QSAR modeling poses a series of open challenges. First, chemical compounds are represented in a high-dimensional space, so feature selection is typically applied, although this task entails a challenging combinatorial problem with potential loss of information. Second, defining the applicability domain of a QSAR model is desirable for determining the reliability of predictions on unseen chemicals, but it is often difficult to assess because of the extent of the chemical space. Finally, interpretability of these models is also a critical issue for drug designers. The purpose of this work is to thoroughly assess the application of neural-based methods and recent advances in deep learning for QSAR modeling. We hypothesize that neural-based methods can overcome the need to perform a descriptor selection phase. We developed three QSAR models based on neural networks for prediction of relevant chemical and biomedical properties that, in the absence of any feature selection step, can outperform the state-of-the-art models for such properties. We also implemented an embedded applicability domain technique based on network output probabilities that proved to be effective; its application improved the predictive performance of the model. Finally, we proposed the use of a post hoc feature analysis technique based on an aggregation of network weights, which enabled effective detection of relevant features in the model.
Fil: Sabando, María Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Fil: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Fil: Soto, Axel Juan. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
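One common way to derive an applicability domain from network output probabilities is to accept a prediction only when the largest class probability reaches a confidence threshold. The sketch below assumes that rule; the abstract does not state the exact criterion the authors used, and the function names, the 0.8 threshold, and the toy data are all hypothetical.

```python
from typing import List, Optional, Sequence


def in_domain(probabilities: Sequence[Sequence[float]],
              threshold: float = 0.8) -> List[bool]:
    """Flag predictions whose top class probability reaches the threshold.

    The threshold rule is an assumption made for illustration.
    """
    return [max(p) >= threshold for p in probabilities]


def accuracy(probabilities: Sequence[Sequence[float]], labels: Sequence[int],
             mask: Optional[Sequence[bool]] = None) -> float:
    """Accuracy of arg-max predictions, optionally restricted to masked-in samples."""
    if mask is None:
        mask = [True] * len(labels)
    kept = [(p, y) for p, y, m in zip(probabilities, labels, mask) if m]
    hits = sum(max(range(len(p)), key=p.__getitem__) == y for p, y in kept)
    return hits / len(kept)


# Toy data, invented for this example only.
probs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.48, 0.52]]
labels = [0, 1, 1, 0]
mask = in_domain(probs)
```

On this toy data the two low-confidence predictions are the two wrong ones, so restricting evaluation to the in-domain samples raises accuracy, which is the kind of effect the abstract reports.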
Discovering time-lagged rules from microarray data using gene profile classifiers
Background: Gene regulatory networks have an essential role in every process of life. Genome-wide time series data are becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes.
Results: This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. To this end, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes show that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experiments showed the soundness and scalability of the new method, which inferred highly related, statistically significant gene associations.
Conclusions: A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results demonstrate that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation.
© 2011 Gallo et al; licensee BioMed Central Ltd.
Fil: Gallo, Cristian Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Fil: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Fil: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
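The idea of a time-lagged relationship between two gene profiles can be illustrated with a small sketch. This is not GRNCOP2: it assumes a simple above-mean discretization and scores how often the regulator's state at time t matches the target's state at time t + lag. The function names and the toy profiles are hypothetical.

```python
from typing import List, Sequence


def discretize(profile: Sequence[float]) -> List[int]:
    """Binarize an expression profile: 1 if above its own mean, else 0."""
    mean = sum(profile) / len(profile)
    return [1 if x > mean else 0 for x in profile]


def lagged_agreement(regulator: Sequence[float], target: Sequence[float],
                     lag: int) -> float:
    """Fraction of time points where regulator(t) equals target(t + lag)."""
    r, t = discretize(regulator), discretize(target)
    pairs = list(zip(r[:len(r) - lag], t[lag:]))
    return sum(a == b for a, b in pairs) / len(pairs)


def best_lag(regulator: Sequence[float], target: Sequence[float],
             max_lag: int) -> int:
    """Smallest lag in 0..max_lag with the highest agreement score."""
    return max(range(max_lag + 1),
               key=lambda d: lagged_agreement(regulator, target, d))


# Toy profiles, invented for this example: the target repeats the regulator one step later.
regulator = [0, 0, 9, 9, 0, 0, 0, 0]
target = [0, 0, 0, 9, 9, 0, 0, 0]
```

Here `best_lag(regulator, target, 2)` selects a lag of one time step, because the shifted profiles agree at every compared point.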
Algoritmos Paralelos Distribuidos para Búsquedas en Profundidad sobre Grafos (Distributed Parallel Algorithms for Depth-First Searches on Graphs)
A decentralized parallel-distributed algorithm to carry out depth-first searches on graphs is presented. The method is based on a new parallel-distributed architecture that is proposed in this article. In this formulation the computing tasks are distributed among three kinds of nodes: the Master, the Supervisors and the Workers. The Master organizes the distribution of the various search subspaces among the Supervisors. In turn, each Supervisor delegates the exploration of the subpaths inside its assigned subspace to the Workers under its control. Each Worker explores a given part of the search space and sends its Supervisor every subpath it finds. Finally, the Supervisor recombines its own subpaths with those stored by the other Supervisors. The new algorithm was implemented in C using the PVM message-passing library, and its performance was evaluated in terms of speed-up and efficiency.
Fil: Fapitalle, Federico. Universidad Nacional del Sur; Departamento de Ciencias de la Computación; Argentina
Fil: Vazquez, Gustavo Esteban. Universidad Nacional del Sur; Departamento de Ciencias de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Fil: Ponzoni, Ignacio. Universidad Nacional del Sur; Departamento de Ciencias de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Fil: Brignole, Nélida Beatriz. Universidad Nacional del Sur; Departamento de Ciencias de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
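The partitioning idea in this abstract can be sketched on a single machine. The example below is a deliberate simplification, not the authors' algorithm: it uses Python threads instead of C with PVM, collapses the Supervisor layer, and splits the search space by the first hop out of the source node. The function names `dfs_paths` and `parallel_paths` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, List

Graph = Dict[str, List[str]]


def dfs_paths(graph: Graph, start: str, goal: str,
              visited: FrozenSet[str]) -> List[List[str]]:
    """Worker role: depth-first enumeration of simple paths from start to goal."""
    if start == goal:
        return [[start]]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in visited:
            for tail in dfs_paths(graph, nxt, goal, visited | {nxt}):
                paths.append([start] + tail)
    return paths


def parallel_paths(graph: Graph, source: str, goal: str,
                   workers: int = 2) -> List[List[str]]:
    """Master role: assign one search subspace per first hop, then recombine."""
    if source == goal:
        return [[source]]

    def explore(hop: str) -> List[List[str]]:
        return dfs_paths(graph, hop, goal, frozenset({source, hop}))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order the subspaces were submitted.
        results = pool.map(explore, graph.get(source, []))
        return [[source] + sub for part in results for sub in part]


graph = {"A": ["B", "C"], "B": ["D"], "C": ["B", "D"], "D": []}
```

Calling `parallel_paths(graph, "A", "D")` explores the subspace behind `B` and the subspace behind `C` independently and then prefixes each subpath with the source, mirroring the distribute-then-recombine structure described above.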
On Artificial Gene Regulatory Networks
Gene regulatory networks (GRNs) represent dependencies between genes and their products during protein synthesis at the molecular level. At present there are many methods that infer GRNs from observed data. However, gene expression data sets generally contain considerable noise, which makes understanding and learning even simple regulatory patterns difficult. Also, there is no well-known method to test the accuracy of inferred GRNs. Given these drawbacks, characterizing the effectiveness of different techniques to uncover gene networks remains a challenge. The development of artificial GRNs with known biological features of expression complexity, diversity and interconnectivity provides a more controlled means of investigating the appropriateness of those techniques. In this work we introduce this problem in terms of machine learning and present a review of the main formalisms that have been used.
Sociedad Argentina de Informática e Investigación Operativa
Biclustering in data mining using a memetic multi-objective evolutionary algorithm
In this paper, a new memetic strategy that integrates a multi-objective evolutionary algorithm (SPEA2) with a local search technique for data mining is presented. The algorithm explores a Term Frequency-Inverse Document Frequency (TF-IDF) data matrix in order to find biclusters that fulfill several objectives. The case study was a dataset corresponding to the Reuters-21578 corpus. Our algorithm performed satisfactorily, finding biclusters that have large size and coherent values, which is a promising outcome. Nonetheless, more experiments with data from other corpora are necessary to reach more conclusive results.
Workshop de Agentes y Sistemas Inteligentes (WASI)
Red de Universidades con Carreras en Informática (RedUNCI)
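For readers unfamiliar with the TF-IDF matrix that the algorithm explores, here is a minimal sketch of one standard formulation: term frequency normalized by document length, multiplied by the natural logarithm of the inverse document frequency. Several TF-IDF variants exist and the abstract does not say which one was used, so this choice is an assumption; the function name and toy documents are hypothetical, and documents are assumed to be non-empty token lists.

```python
import math
from typing import List, Tuple


def tfidf_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, matrix) with one row per document and one column per term."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {term: sum(term in doc for doc in docs) for term in vocab}
    matrix = [
        [(doc.count(term) / len(doc)) * math.log(n_docs / df[term]) for term in vocab]
        for doc in docs
    ]
    return vocab, matrix


# Toy documents, invented for this example.
vocab, matrix = tfidf_matrix([["gene", "network"], ["gene", "cluster"]])
```

A term that appears in every document, such as `gene` here, receives weight zero, so only the discriminating terms contribute to the rows and columns a biclustering method would group.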
An Evolutionary Algorithm for Automatic Recommendation of Clustering Methods and its Parameters
One of the main problems in data clustering is determining the best clustering method and, at the same time, the ideal number (k) of groups into which the data should be separated. In this paper, a preliminary version of a clustering recommender method is presented which, starting from a set of standardized data, suggests the best clustering strategy and also proposes an advisable k value. To this end, the algorithm considers four indices for evaluating the final structure of clusters: Dunn, Silhouette, Widest Gap and Entropy. The prototype is implemented as a Genetic Algorithm in which individuals are possible configurations of the methods and their parameters. In this first prototype, the algorithm chooses among four partitioning methods, namely K-means, PAM, CLARA, and Fanny. The best set of parameters for executing the suggested method is also obtained. The prototype was developed in an R environment, and its findings were consistent with a combination of results provided by other methods with similar objectives. This prototype is intended to serve as the initial basis for a more complex framework that also incorporates the reduction of matrices with vast numbers of rows.
Fil: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Fil: Latini, Macarena Anahí. Universidad Nacional del Sur; Argentina
Fil: Ponzoni, Ignacio. Universidad Nacional del Sur; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Fil: Cecchini, Rocío Luján. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
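Of the four indices listed in this abstract, the Dunn index is the simplest to state: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster, with higher values indicating compact, well-separated clusters. The sketch below follows that standard definition with Euclidean distance. It is written in Python rather than the R environment the authors used, the function name is hypothetical, and it assumes at least two clusters and at least one cluster with two distinct points.

```python
import math
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]


def dunn_index(clusters: List[List[Point]]) -> float:
    """Minimum inter-cluster distance divided by maximum intra-cluster diameter."""
    max_diameter = max(
        math.dist(p, q) for cluster in clusters for p, q in combinations(cluster, 2)
    )
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter
```

A recommender of the kind described could evaluate such an index for each candidate method and k value and use it as one component of a fitness function.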
Using Molecular Embeddings in QSAR Modeling: Does it Make a Difference?
With the consolidation of deep learning in drug discovery, several novel
algorithms for learning molecular representations have been proposed. Despite
the interest of the community in developing new methods for learning molecular
embeddings and their theoretical benefits, comparing molecular embeddings with
each other and with traditional representations is not straightforward, which
in turn hinders the process of choosing a suitable representation for QSAR
modeling. A reason behind this issue is the difficulty of conducting a fair and
thorough comparison of the different existing embedding approaches, which
requires numerous experiments on various datasets and training scenarios. To
close this gap, we reviewed the literature on methods for molecular embeddings
and reproduced three unsupervised and two supervised molecular embedding
techniques recently proposed in the literature. We compared these five methods
concerning their performance in QSAR scenarios using different classification
and regression datasets. We also compared these representations to traditional
molecular representations, namely molecular descriptors and fingerprints. As
opposed to the expected outcome, our experimental setup consisting of over
25,000 trained models and statistical tests revealed that the predictive
performance using molecular embeddings did not significantly surpass that of
traditional representations. While supervised embeddings yielded competitive
results compared to those using traditional molecular representations,
unsupervised embeddings tended to perform worse than traditional
representations. Our results highlight the need for conducting a careful
comparison and analysis of the different embedding techniques prior to using
them in drug design tasks, and motivate a discussion about the potential of
molecular embeddings in computer-aided drug design.
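PolyMaS: Nowe oprogramowanie do generowania makrocząsteczek polimerów o dużej masie cząsteczkowej z powtarzalnych jednostek strukturalnych (PolyMaS: New software for generating high-molecular-weight polymer macromolecules from repeating structural units)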
The Polymer Maker SMILES-based (PolyMaS) software was used to generate linear macromolecules from the structural repeating units (SRU) of polymers without limiting their length or molar mass. The SRU input is written in SMILES code, which is available on the Internet. PolyMaS makes head-tail junctions until the macromolecule reaches the desired length.
Fil: Schustik, Santiago. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
Fil: Cravero, Fiorella. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
Fil: Martínez, María Jimena. Universidad Nacional del Centro de la Provincia de Buenos Aires; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
Fil: Ponzoni, Ignacio. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Fil: Diaz, Monica Fatima. Universidad Nacional del Sur. Departamento de Ingeniería Química; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
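A head-tail junction on SMILES strings can be illustrated with a very small sketch. This is not how PolyMaS is implemented, which the abstract does not describe. It relies on a property of SMILES: when a repeating-unit string begins with the head atom, has the tail atom as its last main-chain atom, and closes every ring and branch it opens, concatenating copies bonds each tail to the next head. The function name is hypothetical and end groups are ignored.

```python
def build_linear_polymer(sru_smiles: str, n_units: int) -> str:
    """Join n_units copies of a repeating unit head-to-tail by concatenation.

    Assumes the SRU string starts at the head atom, ends on the tail atom
    (optionally followed by closed branches), and leaves no ring bond open.
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return sru_smiles * n_units


# Ethylene repeating unit "CC": three units give a six-carbon chain.
trimer = build_linear_polymer("CC", 3)
# Styrene repeating unit with the phenyl ring as a closed branch on the tail carbon.
dimer = build_linear_polymer("CC(c1ccccc1)", 2)
```

Because the output is only a string, chain length is limited by memory rather than by any molecular size cap, which matches the abstract's emphasis on unrestricted length.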
pdAGMO para configuración inicial de sensores en procesos industriales (pdAGMO for initial sensor configuration in industrial processes)
This work presents a parallel-distributed implementation of a multi-objective genetic algorithm (pdAGMO), developed to select the initial sensor configuration in the instrumentation design of process plants. pdAGMO was designed using the island evolutionary model and the master-worker paradigm, and it was implemented with the PVM (Parallel Virtual Machine) message-passing library. Its performance was evaluated by applying it to an industrial case study corresponding to an ammonia production plant. The results achieved are very satisfactory in terms of speed-up, efficiency, and quality of the instrumentation design.
Fil: Asteasuain, Fernando. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina
Fil: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Fil: Vazquez, Gustavo Esteban. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Fil: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Fil: Brignole, Nélida Beatriz. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
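Speed-up and efficiency, the two parallel performance measures named in this abstract, have standard definitions: speed-up is the sequential run time divided by the parallel run time, and efficiency is the speed-up divided by the number of processors. The sketch below encodes those definitions; the function names and the timings in the usage lines are hypothetical and are not results from the paper.

```python
def speed_up(t_sequential: float, t_parallel: float) -> float:
    """Speed-up S = T_1 / T_p."""
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, processors: int) -> float:
    """Efficiency E = S / p, where p is the number of processors."""
    return speed_up(t_sequential, t_parallel) / processors


# Illustrative timings only: 100 s sequentially, 25 s on 8 processors.
s = speed_up(100.0, 25.0)
e = efficiency(100.0, 25.0, 8)
```

An efficiency of 1.0 would mean ideal linear scaling; values below that reflect communication and coordination overhead, such as message passing between islands or between master and workers.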
- …